Towards a Holistic Integration of Spreadsheets with Databases: A Scalable Storage Engine for Presentational Data Management
Spreadsheet software is the tool of choice for interactive ad-hoc data
management, with adoption by billions of users. However, spreadsheets are not
scalable, unlike database systems. On the other hand, database systems, while
highly scalable, do not support interactivity as a first-class primitive. We
are developing DataSpread, to holistically integrate spreadsheets as a
front-end interface with databases as a back-end datastore, providing
scalability to spreadsheets, and interactivity to databases, an integration we
term presentational data management (PDM). In this paper, we make a first step
towards this vision: developing a storage engine for PDM, studying how to
flexibly represent spreadsheet data within a database and how to support and
maintain access by position. We first conduct an extensive survey of
spreadsheet use to motivate our functional requirements for a storage engine
for PDM. We develop a natural set of mechanisms for flexibly representing
spreadsheet data and demonstrate that identifying the optimal representation is
NP-Hard; however, we develop an efficient approach to identify the optimal
representation from an important and intuitive subclass of representations. We
extend our mechanisms with positional access mechanisms that don't suffer from
cascading update issues, leading to constant time access and modification
performance. We evaluate these representations on a workload of typical
spreadsheets and spreadsheet operations, providing up to 20% reduction in
storage, and up to 50% reduction in formula evaluation time.
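To illustrate the cascading update issue mentioned in the abstract above, here is a minimal sketch and not the paper's actual mechanism. If each stored row carried its own row number, inserting a row near the top would force every later row to be renumbered. The sketch instead keeps a separate position-to-key mapping, so stored rows are never rewritten. The class `PositionalMap` and its methods are hypothetical names chosen for illustration, and the plain Python list used for the ordering is a stand-in: a counted tree or similar order-statistic structure would be needed to avoid linear-time shifts at scale.

```python
class PositionalMap:
    """Hypothetical sketch: map 1-based row positions to stable row keys.

    Rows are stored under keys that never change. Only the ordering
    structure is touched on insert or delete, so no stored row has to be
    renumbered when rows shift position.
    """

    def __init__(self):
        self._order = []    # row keys in display order (stand-in structure)
        self._rows = {}     # stable key -> row data
        self._next_key = 0  # monotonically increasing key generator

    def insert(self, position, data):
        """Insert a row so that it ends up at the given 1-based position."""
        if not 1 <= position <= len(self._order) + 1:
            raise IndexError("position out of range")
        key = self._next_key
        self._next_key += 1
        self._rows[key] = data
        self._order.insert(position - 1, key)
        return key

    def get(self, position):
        """Fetch the row currently displayed at the given 1-based position."""
        return self._rows[self._order[position - 1]]

    def delete(self, position):
        """Remove the row at the given 1-based position and return its data."""
        key = self._order.pop(position - 1)
        return self._rows.pop(key)


if __name__ == "__main__":
    sheet = PositionalMap()
    sheet.insert(1, "alpha")
    sheet.insert(2, "gamma")
    sheet.insert(2, "beta")   # "gamma" moves to position 3, its key is unchanged
    print([sheet.get(p) for p in (1, 2, 3)])
```

In this sketch the insert in the middle changes which position "gamma" is shown at, but the entry stored for "gamma" is never modified, which is the property the abstract refers to.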
A Deep Dive into Blockchain Selfish Mining
This paper studies a fundamental problem in blockchain security: how the
existence of multiple misbehaving pools influences the profitability of
selfish mining. Each selfish miner maintains a private chain and makes it
public opportunistically in order to acquire rewards disproportionate to
its hashrate. We establish a novel Markov chain model to characterize all
the state transitions of the public and private chains. The minimum hashrate
requirement and the minimum delay before selfish mining becomes profitable
are derived in closed form. The former reduces to 21.48% with symmetric
selfish miners, while competition between miners with asymmetric hashrates
raises the profitable threshold. The profitable delay increases as the
hashrate of the selfish miners decreases, making mining pools more cautious
about performing selfish mining.
Comment: 6 pages, 13 figures
A scalable direct manipulation engine for position-aware presentational data management
With the explosion of data, large datasets are becoming more common in data analysis. However, existing analytic tools lack scalability, and large-scale data management tools lack interactivity. Because many data analysis tasks depend on the order of the data, we propose the first positional storage engine that supports persisting and maintaining orders for large datasets and allows direct manipulation of those orders. We introduce a sparse monotonic order statistic structure for persisting and maintaining order. We also show how to support multiple orders and optimize storage. We then demonstrate a buffered storage manager that keeps direct manipulation interactive. Finally, we present our system, DataSpread, which is both interactive and scalable. We hope that our solution points to a potential direction for supporting data analysis on large-scale data.
DataSpread: Unifying Databases and Spreadsheets.
Spreadsheet software is often the tool of choice for ad-hoc tabular data management, processing, and visualization, especially on tiny data sets. On the other hand, relational database systems offer significant power, expressivity, and efficiency over spreadsheet software for data management, while lacking in ease of use and ad-hoc analysis capabilities. We demonstrate DataSpread, a data exploration tool that holistically unifies databases and spreadsheets. It continues to offer a Microsoft Excel-based spreadsheet front-end, while in parallel managing all the data in a back-end database, specifically, PostgreSQL. DataSpread retains all the advantages of spreadsheets, including ease of use, ad-hoc analysis and visualization capabilities, and a schema-free nature, while also adding the advantages of traditional relational databases, such as scalability and the ability to use arbitrary SQL to import, filter, or join external or internal tables and have the results appear in the spreadsheet. DataSpread needs to reason about and reconcile differences in the notions of schema, addressing of cells and tuples, and the current pane (which exists in spreadsheets but not in traditional databases), and support data modifications at both the front-end and the back-end. Our demonstration will center on our first, early prototype of DataSpread, and will give attendees a sense of the enormous data exploration capabilities offered by unifying spreadsheets and databases.
How Service Guarantee Induces Customer Opportunism Behavior in Online Environment: The Moderating Role of Customers' Personal Characteristics and Reference Group's Relationship Strength
On the internet, enterprises provide service guarantees, such as seven-day no-reason returns, to effectively reduce the perceived risk of online customers. However, such service guarantees also lead to opportunistic behavior by some customers. Taking customers' personal characteristics and the reference group's relationship strength as moderator variables, we conduct an empirical study of the major factor and its effect paths on customer opportunistic behavior, using a scenario role-playing approach. The results show that a stronger service guarantee is more likely to induce customer opportunistic behavior. Customers' personality (Machiavellianism) does not moderate this relationship. In contrast, relationship strength significantly moderates the impact of service guarantee strength on customers' opportunistic behavior: customers who know that close friends have behaved opportunistically are more likely to choose similar behavior when they face a stronger service guarantee.
Stability Based Generalization Bounds for Exponential Family Langevin Dynamics
Recent years have seen advances in generalization bounds for noisy stochastic
algorithms, especially stochastic gradient Langevin dynamics (SGLD) based on
stability (Mou et al., 2018; Li et al., 2020) and information theoretic
approaches (Xu and Raginsky, 2017; Negrea et al., 2019; Steinke and
Zakynthinou, 2020). In this paper, we unify and substantially generalize
stability based generalization bounds and make three technical contributions.
First, we bound the generalization error in terms of expected (not uniform)
stability which arguably leads to quantitatively sharper bounds. Second, as our
main contribution, we introduce Exponential Family Langevin Dynamics (EFLD), a
substantial generalization of SGLD, which includes noisy versions of Sign-SGD
and quantized SGD as special cases. We establish data-dependent expected
stability based generalization bounds for any EFLD algorithm with an O(1/n)
sample dependence and dependence on gradient discrepancy rather than the norm
of gradients, yielding significantly sharper bounds. Third, we establish
optimization guarantees for special cases of EFLD. Further, empirical results
on benchmarks illustrate that our bounds are non-vacuous, quantitatively
sharper than existing bounds, and behave correctly under noisy labels.
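For orientation, the SGLD update that the abstract above generalizes is commonly written as shown below. The notation is assumed here for illustration and is not taken from the paper: \theta_t is the parameter vector at step t, \eta_t the step size, \beta an inverse temperature, and \hat{L}_{B_t} the empirical loss on mini-batch B_t. Scaling conventions for the noise term vary between papers.

```latex
\theta_{t+1} \;=\; \theta_t \;-\; \eta_t \,\nabla \hat{L}_{B_t}(\theta_t)
\;+\; \sqrt{\frac{2\eta_t}{\beta}}\;\xi_t,
\qquad \xi_t \sim \mathcal{N}(0, I_d)
```

In this form each step is a mini-batch gradient step plus isotropic Gaussian noise. The abstract describes EFLD as replacing this Gaussian noise model with a broader exponential-family one, which is how noisy Sign-SGD and quantized SGD arise as special cases.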